 power posterior


Amortized Simulation-Based Inference in Generalized Bayes via Neural Posterior Estimation

arXiv.org Machine Learning

Generalized Bayesian Inference (GBI) tempers a loss with a temperature $\beta>0$ to mitigate overconfidence and improve robustness under model misspecification, but existing GBI methods typically rely on costly MCMC or SDE-based samplers and must be re-run for each new dataset and each $\beta$ value. We give the first fully amortized variational approximation to the tempered posterior family $p_\beta(\theta\mid x) \propto \pi(\theta)\,p(x \mid \theta)^\beta$ by training a single $(x,\beta)$-conditioned neural posterior estimator $q_\phi(\theta\mid x,\beta)$ that enables sampling in a single forward pass, without simulator calls or inference-time MCMC. We introduce two complementary training routes: (i) synthesize off-manifold samples $(\theta,x) \sim \pi(\theta)\,p(x \mid \theta)^\beta$ and (ii) reweight a fixed base dataset $\pi(\theta)\,p(x \mid \theta)$ using self-normalized importance sampling (SNIS). We show that the SNIS-weighted objective provides a consistent forward-KL fit to the tempered posterior with finite weight variance. Across four standard simulation-based inference (SBI) benchmarks, including the chaotic Lorenz-96 system, our $\beta$-amortized estimator achieves competitive posterior approximations in standard two-sample metrics, matching non-amortized MCMC-based power-posterior samplers over a wide range of temperatures.
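
To make the tempered posterior and the SNIS idea concrete, here is a minimal sketch in a toy conjugate setting. It is not the paper's training procedure: it assumes a standard normal prior, a Gaussian likelihood with known `sigma`, and a single fixed observation, and it reweights prior draws by $p(x \mid \theta)^\beta$ rather than reweighting a joint base dataset. The function names are hypothetical. The closed-form tempered posterior mean is used only to check the weighted estimate.

```python
import numpy as np


def snis_tempered_mean(x_obs, beta, sigma=1.0, n=200_000, seed=0):
    """SNIS estimate of E[theta] under p_beta(theta | x) ∝ pi(theta) p(x | theta)^beta.

    Assumed toy model: pi(theta) = N(0, 1), p(x | theta) = N(x; theta, sigma^2).
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 1.0, size=n)  # draws from the prior (the proposal)
    # Log importance weights: beta * log-likelihood, dropping additive constants.
    log_w = beta * (-0.5 * ((x_obs - theta) / sigma) ** 2)
    log_w -= log_w.max()                  # stabilise before exponentiating
    w = np.exp(log_w)
    w /= w.sum()                          # self-normalisation
    ess = 1.0 / np.sum(w ** 2)            # effective sample size diagnostic
    return float(np.sum(w * theta)), float(ess)


def exact_tempered_mean(x_obs, beta, sigma=1.0):
    """Closed-form mean of the tempered posterior in the same toy model."""
    precision = 1.0 + beta / sigma ** 2
    return (beta * x_obs / sigma ** 2) / precision


if __name__ == "__main__":
    for beta in (0.25, 0.5, 1.0):
        est, ess = snis_tempered_mean(1.0, beta)
        print(beta, est, exact_tempered_mean(1.0, beta), round(ess))
```

Smaller $\beta$ flattens the likelihood term, so the weights are more even and the effective sample size is larger; at $\beta = 1$ the estimate targets the ordinary Bayesian posterior.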


Simulating normalising constants with referenced thermodynamic integration: application to COVID-19 model selection

arXiv.org Machine Learning

Model selection is a fundamental part of Bayesian statistical inference and a widely used tool in the field of epidemiology. Simple methods such as the Akaike Information Criterion are commonly used, but they do not incorporate the uncertainty of the model's parameters, which can give misleading choices when comparing models with similar fit to the data. A more rigorous approach to model selection, which uses the full posterior distributions of the models, is to compute the ratio of the normalising constants (or model evidence), known as the Bayes factor. These normalising constants integrate the posterior distribution over all parameters and balance over- and under-fitting. However, normalising constants often come in the form of intractable, high-dimensional integrals, so special probabilistic techniques need to be applied to correctly estimate the Bayes factors. One such method is thermodynamic integration (TI), which can be used to estimate the ratio of two models' evidence by integrating over a continuous path between the two un-normalised densities. In this paper we introduce a variation of the TI method, here referred to as referenced TI, which computes a single model's evidence in an efficient way by using a reference density, such as a multivariate normal, whose normalising constant is known. We show that referenced TI, an asymptotically exact Monte Carlo method of calculating the normalising constant of a single model, in practice converges to the correct result much faster than other competing approaches such as the method of power posteriors. We illustrate the implementation of the algorithm on informative 1- and 2-dimensional examples, apply it to a popular linear regression problem, and use it to select parameters for a model of the COVID-19 epidemic in South Korea.
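
As a point of reference for the path-integration idea, the sketch below implements the standard power-posterior form of thermodynamic integration, not the referenced TI variant introduced in the paper. It uses the identity $\log Z = \int_0^1 \mathbb{E}_{p_t}[\log p(x \mid \theta)]\,dt$ with $p_t(\theta) \propto \pi(\theta)\,p(x \mid \theta)^t$. The toy model (standard normal prior, unit-variance Gaussian likelihood, one observation) and all function names are assumptions chosen so that each $p_t$ can be sampled exactly and the evidence is known in closed form.

```python
import numpy as np


def log_lik(theta, x):
    """Log of N(x; theta, 1), the assumed toy likelihood."""
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * (x - theta) ** 2


def ti_log_evidence(x, n_temps=41, n_samples=20_000, seed=0):
    """Power-posterior thermodynamic integration estimate of log Z.

    With prior N(0, 1), the power posterior at temperature t is
    N(t * x / (1 + t), 1 / (1 + t)), so it is sampled directly here
    instead of by MCMC.
    """
    rng = np.random.default_rng(seed)
    ts = np.linspace(0.0, 1.0, n_temps)
    expectations = np.empty(n_temps)
    for i, t in enumerate(ts):
        mean = t * x / (1.0 + t)
        std = np.sqrt(1.0 / (1.0 + t))
        theta = rng.normal(mean, std, size=n_samples)
        expectations[i] = log_lik(theta, x).mean()  # E_{p_t}[log p(x | theta)]
    # Trapezoidal rule over the temperature grid.
    return float(np.sum(0.5 * (expectations[1:] + expectations[:-1]) * np.diff(ts)))


def exact_log_evidence(x):
    """Marginal likelihood of the toy model: x ~ N(0, 2)."""
    return -0.5 * np.log(2.0 * np.pi * 2.0) - x ** 2 / 4.0


if __name__ == "__main__":
    x = 1.5
    print(ti_log_evidence(x), exact_log_evidence(x))
```

In realistic models each temperature needs its own MCMC run, which is the cost the abstract's comparison refers to; the uniform temperature grid here is a simplification.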


Semi-Modular Inference: enhanced learning in multi-modular models by tempering the influence of components

arXiv.org Machine Learning

Bayesian statistical inference loses predictive optimality when generative models are misspecified. Working within an existing coherent loss-based generalisation of Bayesian inference, we show existing Modular/Cut-model inference is coherent, and write down a new family of Semi-Modular Inference (SMI) schemes, indexed by an influence parameter, with Bayesian inference and Cut-models as special cases. We give a meta-learning criterion and estimation procedure to choose the inference scheme. This returns Bayesian inference when there is no misspecification. The framework applies naturally to Multi-modular models. Cut-model inference allows directed information flow from well-specified modules to misspecified modules, but not vice versa. An existing alternative power posterior method gives tunable but undirected control of information flow, improving prediction in some settings. In contrast, SMI allows tunable and directed information flow between modules. We illustrate our methods on two standard test cases from the literature and a motivating archaeological data set.


Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing

arXiv.org Machine Learning

Various researchers have studied posterior inference of parameters in Bayesian mixture models [24, 42, 23], so that the statistical behavior of such models is relatively well-understood. In contrast, much less is known about the efficiency of different algorithms for sampling from the posterior distributions that arise from Bayesian mixture models. A standard approach for doing so is via some form of Markov Chain Monte Carlo (MCMC). Many different types of MCMC algorithms have been introduced for various types of Bayesian mixture models, including finite Bayesian mixture models [21, 49, 50, 26, 40], Dirichlet process mixture models [37, 41, 25, 28], and hierarchical and nested Dirichlet process models [52, 47]. Despite the plethora of possible MCMC methods, upper bounds on their mixing times are often challenging to establish. We refer the reader to the papers [27, 3, 55, 48, 57] for non-asymptotic upper bounds on mixing times for certain types of Bayesian models, different from those studied in this paper. In recent years, it has been increasingly common in the Bayesian literature to make use of a fractional likelihood, meaning an ordinary likelihood raised to some fractional power. Combining such a fractional likelihood with a prior distribution in the usual way leads to a class of posteriors known as power posterior or fractional posterior distributions. The power posterior distributions have been shown to have attractive properties in terms of robustness to mis-specification in Bayesian mixture models [39], and have been used in various applications ... (arXiv:1912.05153v1)


Challenges in Bayesian inference via Markov chain Monte Carlo for neural networks

arXiv.org Machine Learning

Markov chain Monte Carlo (MCMC) methods and neural networks are instrumental in tackling inferential and prediction problems. However, Bayesian inference based on joint use of MCMC methods and of neural networks is limited. This paper reviews the main challenges posed by neural networks to MCMC developments, including lack of parameter identifiability due to weight symmetries, prior specification effects, and consequently high computational cost and convergence failure. Population and manifold MCMC algorithms are combined to demonstrate these challenges via multilayer perceptron (MLP) examples and to develop case studies for assessing the capacity of approximate inference methods to uncover the posterior covariance of neural network parameters. Some of these challenges, such as high computational cost arising from the application of neural networks to big data and parameter identifiability arising from weight symmetries, stimulate research towards more scalable approximate MCMC methods or towards MCMC methods in reduced parameter spaces.